InfoMagic Internet Tools 1995 April

Text File | 1995-04-24 | 2.7 KB | 70 lines

Logging user access is something which we definitely want, for a number of reasons:

- Justifying the project by showing statistics
- Demonstrating the readership profiles of different material
- Demonstrating the usage profile across sites

The privacy issue is very important, and so I had intended to log each action "A read B" as "A read something" and "B was read" independently. This would give the basic profiles. Anything further would be an infringement of privacy, so yes, the user would have to agree to it. The problem is that the sociological data would then be immediately filtered ... all the alt.sex.bondage readers would filter themselves out! Perhaps two levels are needed.

The network load is also something which I considered a possible problem, so I decided on a scheme (have I said this before?) in which an event is logged with probability p=exp(-a*t), and p is included in the message so that the message can be given weight 1/p in the analysis. The time t over which p decays is measured from compilation of the source, so you get more fine-grained info on the new releases. The messages would be UDP packets so as not to clog gateways. We have a monitoring service here which is already monitoring the use of other CERN software; I am not sure whether it is TCP or UDP based.

*Coincidence:* As I write, the file system on our server has JUST filled up while attempting to process the server's January log data.... is this a warning?!

BTW: Marc, you were going to log how LONG an article was read for. I think that is very tricky... if you can come up with a good (automatic) measure of how much the person LIKED the article, then you will really have something. Someone in Stockholm whose name I forget just gave a talk about inferring document affinities from readership profiles... using the user as a more refined text comparison program than a word occurrence engine.
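The logging scheme described above, splitting "A read B" and sampling with p=exp(-a*t), can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the CERN implementation: the decay constant `DECAY_PER_DAY`, the `LogMessage` record and the function names are hypothetical, t is taken in days since compilation, and actually sending each message as a UDP datagram is left out. Each "A read B" event becomes two separate messages, each sent with probability p and carrying p, and the analysis weights every received message by 1/p.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass

# Hypothetical decay constant "a" (per day); the post gives no value.
DECAY_PER_DAY = 0.05


@dataclass(frozen=True)
class LogMessage:
    kind: str     # "reader" ("A read something") or "document" ("B was read")
    subject: str  # the reader A or the document B, never both together
    p: float      # probability with which this message was sent


def sampling_probability(days_since_compilation, a=DECAY_PER_DAY):
    """p = exp(-a*t): new releases are logged almost always, old ones rarely."""
    return math.exp(-a * days_since_compilation)


def log_read(reader, document, days_since_compilation, rng=random):
    """Split "A read B" into two messages, each sampled independently with probability p."""
    p = sampling_probability(days_since_compilation)
    messages = []
    for kind, subject in (("reader", reader), ("document", document)):
        # A separate coin flip for each half of the event.
        if rng.random() < p:
            messages.append(LogMessage(kind, subject, p))
    return messages


def estimate_counts(messages):
    """Weight each received message by 1/p to estimate the true number of events."""
    totals = Counter()
    for m in messages:
        totals[(m.kind, m.subject)] += 1.0 / m.p
    return totals
```

Because each message carries its own p, the expected weighted count equals the true number of events even though old releases report only rarely; the price is noisier counts for old releases, which is the trade the scheme makes to keep the network load down.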
I suggested WWW usage data as a source, but realized that, for example, of all the documents in the talk I had just given with XMosaic, the one left on the screen for the longest time was quite irrelevant.

Something linked with this is finding relevant material for a particular person. How about a service which takes someone's global history file and tells them about everything new in the world which would interest them? In other words, if you do keep data about a particular person, then that can help them find more data like it.... a sophisticated form of relevance feedback.

- - -

I think that as you are collecting data from the public, the data should also be made available to the public, with names and addresses removed. Another possibility is that all servers keep logs and share the results... but it will always be incomplete.

Tim
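The history-file service suggested above might look something like this minimal sketch. Everything here is a hypothetical illustration: it assumes the texts of the pages listed in the user's global history have already been fetched, and it uses plain word counts with cosine similarity as the measure of interest, which is only one simple choice. The user's history is summed into a term profile, and new documents are ranked by how closely they match it.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def whats_new_for(history_texts, new_documents, top_k=3):
    """Hypothetical sketch: rank new documents against a profile built from the user's history.

    history_texts: texts of documents the user has already visited.
    new_documents: mapping of document name -> text.
    Returns up to top_k names, best match first, skipping documents with no overlap.
    """
    profile = Counter()
    for text in history_texts:
        profile.update(term_vector(text))
    scored = [(cosine(profile, term_vector(text)), name) for name, text in new_documents.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]
```

For example, a history full of hypertext and server pages would rank a new browser release above a page of pasta recipes, which would be dropped entirely for sharing no words with the profile.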